Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers
Authors
Abstract
Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to black-box attacks, which are the more realistic scenario. In this paper, we present a novel algorithm, DeepWordBug, to effectively generate small text perturbations in a black-box setting that force a deep-learning classifier to misclassify a text input. We introduce novel scoring strategies to find the most important tokens to modify such that the classifier will make a wrong prediction. Simple character-level transformations are applied to the highest-ranked tokens in order to minimize the edit distance of the perturbation while still changing the original classification. We evaluated DeepWordBug on eight real-world text datasets covering text classification, sentiment analysis, and spam detection. We compare DeepWordBug with two baselines: Random (black-box) and Gradient (white-box). Our experimental results indicate that DeepWordBug decreases the original classification accuracy by up to 63% on average for a Word-LSTM model and up to 46% on average for a Char-CNN model, both of which are state-of-the-art deep learning models.
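The abstract describes a two-stage black-box procedure: rank tokens by how much they matter to the classifier's decision using only model queries, then apply small character-level edits (e.g. swap, substitution, deletion, insertion) to the top-ranked tokens. The sketch below illustrates that idea under stated assumptions: the leave-one-out scoring proxy, the specific character operations, and the toy keyword classifier are illustrative stand-ins, not the paper's exact scoring functions or models.

```python
import random


def score_tokens(tokens, predict):
    """Score each token by the drop in the classifier's output when
    that token is removed (a leave-one-out proxy for the paper's
    query-based scoring strategies)."""
    base = predict(" ".join(tokens))
    scores = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]
        scores.append(base - predict(" ".join(reduced)))
    return scores


def perturb(token, rng):
    """Apply one small character-level edit (swap, substitute, delete,
    or insert), keeping the edit distance of the change at 1-2."""
    if len(token) < 2:
        return token
    i = rng.randrange(len(token) - 1)
    op = rng.choice(["swap", "sub", "del", "ins"])
    if op == "swap":   # transpose two adjacent characters
        return token[:i] + token[i + 1] + token[i] + token[i + 2:]
    if op == "sub":    # substitute one character
        return token[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + token[i + 1:]
    if op == "del":    # delete one character
        return token[:i] + token[i + 1:]
    return token[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + token[i:]


def attack(text, predict, budget=2, seed=0):
    """Perturb the `budget` highest-scoring tokens of `text`, querying
    `predict` as a black box only."""
    rng = random.Random(seed)
    tokens = text.split()
    scores = score_tokens(tokens, predict)
    order = sorted(range(len(tokens)), key=lambda i: -scores[i])
    for i in order[:budget]:
        tokens[i] = perturb(tokens[i], rng)
    return " ".join(tokens)


def toy_predict(text):
    """Toy stand-in for a black-box classifier: 'spam' score of a text."""
    return 1.0 if "free" in text.split() else 0.0
```

For example, `attack("claim your free prize now", toy_predict, budget=1)` identifies "free" as the highest-scoring token and applies one character-level edit to it; against a real model the loop would continue until the prediction flips or the budget is exhausted.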
Similar resources
DANCin SEQ2SEQ: Fooling Text Classifiers with Adversarial Text Example Generation
Machine learning models are powerful but fallible. Generating adversarial examples, inputs deliberately crafted to cause model misclassification or other errors, can yield important insight into model assumptions and vulnerabilities. Despite significant recent work on adversarial example generation targeting image classifiers, relatively little work exists exploring adversarial example generation...
Improvement of generative adversarial networks for automatic text-to-image generation
This research concerns the use of deep learning tools and image processing technology for the automatic generation of images from text. Previous research has used a single sentence to produce images. In this research, a memory-based hierarchical model is presented that uses three different descriptions, given in the form of sentences, to produce and improve the image. The proposed ...
Generic Black-Box End-to-End Attack Against State of the Art API Call Based Malware Classifiers
Deep neural networks (DNNs) are used to solve complex classification problems for which other machine learning classifiers, such as SVMs, fall short. Recurrent neural networks (RNNs) have been used for tasks that involve sequential inputs, such as speech to text. In the cyber security domain, RNNs based on API calls have been used effectively to classify previously unencountered malware. In t...
Blocking Transferability of Adversarial Examples in Black-Box Learning Systems
Advances in Machine Learning (ML) have led to its adoption as an integral component in many applications, including banking, medical diagnosis, and driverless cars. To further broaden the use of ML models, cloud-based services offered by Microsoft, Amazon, Google, and others have developed ML-as-a-service tools as black-box systems. However, ML classifiers are vulnerable to adversarial examples...
Cascade Adversarial Machine Learning Regularized with a Unified Embedding
Deep neural network classifiers are vulnerable to small input perturbations carefully generated by the adversaries. Injecting adversarial inputs during training, known as adversarial training, can improve robustness against one-step attacks, but not for unknown iterative attacks. To address this challenge, we propose to utilize embedding space for both classification and low-level (pixel-level)...
Journal: CoRR
Volume: abs/1801.04354
Pages: -
Publication year: 2018